DOCUMENT RESUME



ED 299 924 



HE 021 918 



AUTHOR           Astin, Alexander W.; And Others
TITLE            Three Presentations: From the Third National
                 Conference on Assessment in Higher Education
                 (Chicago, Illinois, June 8-11, 1988).
INSTITUTION      American Association for Higher Education,
                 Washington, D.C.
PUB DATE         88
NOTE             57p.; Document collected as part of the American
                 Association for Higher Education Assessment Forum.
AVAILABLE FROM   AAHE Assessment Forum, American Association for
                 Higher Education, One Dupont Circle, Suite 600,
                 Washington, DC 20036 ($10.00).
PUB TYPE         Viewpoints (120) -- Speeches/Conference Papers (150)
EDRS PRICE       MF01/PC03 Plus Postage.
DESCRIPTORS      Accountability; *Educational Assessment; Higher
                 Education; Incentives; Instructional Effectiveness;
                 Multiple Choice Tests; *Outcomes of Education;
                 Productivity; Role of Education; Standardized Tests;
                 *Student Development; Testing
IDENTIFIERS      *AAHE Assessment Forum; *College Outcomes Assessment;
                 Florida



ABSTRACT 

Three presentations from the Third National
Conference on Assessment in Higher Education are included. In
"Assessment and Human Values: Confessions of a Reformed Number
Cruncher," Alexander W. Astin focuses on measuring education
productivity, assessment lessons from the Cooperative Institutional
Research Program, a talent-development model of excellence,
assessment and values, multiple-choice tests, holistic methods,
assessing affective outcomes, and beyond narcissism. It stresses that
the key to achieving institutional transcendence is ultimately in how
excellence is defined. "Assessment and Incentives: The Medium is the
Message" (Linda Darling-Hammond) discusses from a teacher's point of
view the following: how measurement changes behavior; incentives: a
parable with lessons; the K-12 experience; limits of standardized
testing; effects of testing on teaching and learning; and policy
making and assessment. Important factors are educating those who
would impose hasty or inadequate methods, and insisting on
intellectual honesty and educational validity. "The Assessment
Movement: What Next? Who Cares?" (Robert H. McCabe) gives a community
college president's views on access and standards, the public call
for accountability, state initiatives in assessment, the Florida
experience, and institutional assessment initiatives. The assessment
movement is growing in tandem with the teaching/learning movement and
can be considered an element of it. The future of assessment is in
improving student development through more effective teaching and
learning. (SM)



Reproductions supplied by EDRS are the best that can be made
from the original document.






Three Presentations: 

FROM THE THIRD NATIONAL CONFERENCE ON 
ASSESSMENT IN HIGHER EDUCATION 

JUNE 8-11, 1988, CHICAGO



Alexander W. Astin 
Linda Darling-Hammond 
Robert H. McCabe



THE AAHE ASSESSMENT FORUM



AMERICAN ASSOCIATION FOR HIGHER EDUCATION 



One Dupont Circle
Suite 600
Washington, D.C. 20036
202/293-6440




AMERICAN ASSOCIATION FOR HIGHER EDUCATION

Board of Directors

Joseph F. Kauffman
University of Wisconsin

Adele S. Simmons
Hampshire College

Reatha Clark King
Metropolitan State University

Harriet W. Sheridan
Brown University

Carlos H. Arce
NuStats, Inc.

Estela M. Bensimon
Teachers College

Anne L. Bryant
American Association of
University Women

Donald L. Fruehling
McGraw-Hill, Inc.

Ellen V. Futter
Barnard College

Jerry G. Gaff
Hamline University

Zelda F. Gamson
University of Michigan

Stephen R. Graubard

Joseph Katz
State University of New York
at Stony Brook

Arthur E. Levine
Bradford College

Frank Newman
Education Commission
of the States

Alan Pifer
of New York

W. Ann Reynolds
The California State University

Piedad F. Robertson
Miami-Dade Community
College

0. WaynoSaby

P. Michael Timpane
Columbia University



The AAHE ASSESSMENT FORUM is a three-year project supported
by the Fund for the Improvement of Postsecondary Education.
It entails three distinct but overlapping activities:

* an annual conference
  (the first scheduled for June 14-17, 1987, in Denver)

* commissioned papers
  (focused on implementation and other timely assessment
  concerns; available through the Forum for a small fee)

* information services
  (including consultation, referrals, a national directory,
  and more)

This paper is part of an ongoing assessment collection
maintained by the Forum. We are pleased to make it more
widely available through the ERIC system.

For further information about ASSESSMENT FORUM activities,
contact Patricia Hutchings, Director, AAHE ASSESSMENT FORUM,
One Dupont Circle, Suite 600, Washington, DC 20036






ASSESSMENT AND HUMAN VALUES: 
CONFESSIONS OF A REFORMED NUMBER-CRUNCHER



An address by: 

Alexander W. Astin 
UCLA Graduate School 
of Education 



Presented at: 

The AAHE Assessment Forum 
Third National Conference on Assessment 
in Higher Education 
June 8-11, 1988
Chicago, Illinois 



Assessment and Human Values: 
Confessions of a Reformed Number-Cruncher*

Alexander W. Astin 
University of California 
Los Angeles 

My talk this morning is going to be a mixture of technical stuff on assessment and a little
philosophy. What I have to say will probably make more sense if it also includes some
autobiographical notes, so let's start with my graduate education. My doctorate in psychology
was based on a double major (counseling psychology and quantitative psychology), and to an
extent this dual emphasis represents two sides of myself that for the past thirty years have been
struggling to get into balance and in tune with each other. My interest in mental health,
counseling, and psychotherapy is what initially got me into psychology, but in graduate school I
quickly found that I liked writing and that statistics and research design came very easily to me.
Research was just plain fun. Counseling and psychotherapy, on the other hand, not only seemed
to be much more difficult, but I also came to have some serious doubts about whether I was really
doing my clients and patients very much good.

Although my first postdoctoral job was as a clinical psychologist, I managed to
find time to do some research and writing. And although my supervisor seemed to feel that I was a
good clinician, after two years I decided to look for a full-time research position.

How I eventually ended up in the field of educational research is a complicated story that's
probably not very relevant to our topic today; suffice it to say that I quickly found higher education
to be a fascinating and challenging field where the problems seemed, on the surface at least, to be
much more tractable than those in the mental health field.

Measuring Education Productivity 

My first higher education research was concerned with something called "Ph.D.
productivity." Researchers at Wesleyan University and the University of Chicago (Knapp and
Goodrich, 1952; Knapp and Greenbaum, 1953) had found that certain colleges were much more
likely than others to produce graduates who eventually went on to win graduate fellowships and to
earn the Ph.D. degree. Since the "highly productive" colleges also tended to have larger libraries,
better student-faculty ratios, and more faculty who themselves had Ph.D.s, the researchers
concluded that these superior facilities and resources were somehow responsible for the colleges'
higher productivity.

*Presented at the AAHE Assessment Conference, Chicago, June 8-11, 1988.

I was working at the National Merit Scholarship Corporation at the time, and noticed that
the "highly productive" colleges tended to be the same ones that the Merit Scholars preferred to
attend. This fact prompted me to ask a rather simple question: Could a college's "output" of
Ph.D.s be explained simply in terms of its initial "input" of talented freshmen? To test this
possibility we conducted a series of studies which showed that, as far as "Ph.D. output" is
concerned, the "student input" is by far the most important factor. Indeed, it turned out that, when
you took student inputs into account, some of the so-called "highly productive" institutions were
actually underproducing Ph.D.s, whereas some of those with more modest outputs were actually
producing more than one would expect from their student input (Astin, 1961, 1963).
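The input-adjusted comparison described above can be sketched in a few lines. This is a minimal illustration, not the method of the original studies. It assumes a single input measure per college, such as the average ability of entering freshmen. It fits an ordinary least-squares line predicting Ph.D. output from that input, then treats each college's residual (actual minus expected output) as over- or under-production. The college labels and numbers are hypothetical, and a fuller analysis would control for several input variables at once.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares intercept a and slope b for y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def productivity_residuals(colleges):
    """Return actual minus expected output for each college.

    `colleges` maps a (hypothetical) college name to a pair
    (student_input, phd_output). A positive residual means the college
    produces more Ph.D.s than its student input predicts; a negative
    residual means it underproduces.
    """
    names = list(colleges)
    xs = [colleges[n][0] for n in names]
    ys = [colleges[n][1] for n in names]
    a, b = fit_line(xs, ys)
    return {n: y - (a + b * x) for n, x, y in zip(names, xs, ys)}


# Hypothetical data: (mean entering-student ability, Ph.D.s per 100 graduates)
colleges = {"A": (120, 9), "B": (110, 9), "C": (90, 3), "D": (80, 3)}
for name, r in productivity_residuals(colleges).items():
    print(f"{name}: {r:+.1f}")
```

In this made-up example, college A has the highest student input and one of the two highest raw outputs, yet it comes out below expectation. College D has the lowest input and comes out above it, which is the pattern the paragraph describes.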

These early studies were critical in teaching us three fundamental lessons about assessment in
higher education. Let me briefly summarize these lessons for you (Fig. 1).
1. First, the "output" of an institution, whether we
measure this in terms of how many graduates earn
advanced degrees, how much money the alumni earn, or
whatever, doesn't really tell us much about its
educational impact or educational effectiveness.
Rather, outputs must always be evaluated in terms of
inputs (Fig. 2). This is a particularly important
principle for American higher education, given the fact
that the three thousand institutions in our system differ
so greatly in the kinds of students they enroll. I am
speaking here, of course, of the need for longitudinal data.
You can also see the relevance here to the value-added
concept.

2. Second, an output measure such as Ph.D. productivity is not
determined solely by a single input measure such as
student ability. On the contrary, even in our earliest
studies of this phenomenon we found that input variables
such as the student's sex and major field of study are at
least as important as ability in determining Ph.D. outputs.

3. Third, even if we have good longitudinal input and student
output data, our understanding of the educational process
will still be limited if we lack information on the college
environment (Fig. 3). Thus, it is one thing to know that
your college overproduces or underproduces Ph.D.s, but
quite another to understand why. What is it, in other
words, about the environment of a college that causes it to
over- or under-produce? This last lesson suggests that
input and output data, by themselves, are of limited
usefulness. Rather, what we need in addition is
information about the students' educational environment
and experience: the courses, programs, facilities, faculty,
and peer groups to which the student is exposed.

Note: Figures refer to slides not included here.

Perhaps the need for these three kinds of data can be better understood with an analogy
from the physical science of astronomy. Having only output data would be like taking a single
snapshot of a heavenly body, say the sun or the moon or some planet. We might be able to
determine its distance and size, but not much else. Having a series of snapshots of the same
heavenly body would represent an improvement, because we could chart how it changes over time.
This is analogous to having both input and output data. But limiting ourselves only to input and
output data would be like having a good description of the movement of the heavenly body without
knowing what was happening to any of the other planets or satellites or stars in its immediate
environment. In other words, simply being able to document changes in the behavior of a group
of students over a period of time is of limited value if you don't know what forces were acting on
these students during the same period of time.

Perhaps an even better analogy can be found in the field of health care. Let's say we were
trying to enhance our understanding of how best to treat patients in a hospital. Imagine how
difficult it would be if all we did was to collect output information on how long each patient stayed,
and whether they got better, got worse, or died. How could we expect to learn much about how
best to care for our patients if we don't know which patients got which therapies, which
operations, or which medications? This is the equivalent of studying student development with no
"environmental" data on what courses they took, where they lived, how much they studied, and so
on. But if all we do is to collect environmental and outcome (post-test) data, we won't even know
which students (in terms of high school background, family background, major, interests, and so
on) gained the most and which gained the least. This is the equivalent of studying patients' treatments
and eventual outcomes without knowing anything about their history or diagnosis at the point of
entry.

These early studies presented a number of challenges to a quantitatively oriented researcher.
How, for example, do you control for student inputs, and how do you assess environmental effects?
These are extremely complex and difficult technical issues that we don't have time to cover today.
Another quantitative challenge is how best to measure student outputs: What are the important and
relevant outputs, and how do we assess them? (I will return to this question shortly.) But perhaps
the most difficult assessment question of all is how to measure the educational environment. This
is not only a complex and important problem, but it is clearly our most seriously neglected
problem. Many of us in the assessment field have agonized a good deal about what student
outcomes we want to assess or the merits of pretesting and post-testing, but few of us have given
much thought to the problem of assessing and documenting the students' educational experiences.

At the National Merit Scholarship Corporation we were fortunate in having access to a
large group of highly talented and cooperative students who were willing to provide us with input
and output data via mailed questionnaires and personality inventories. We had access to as many
as 35,000 highly able new students every year. It was during this time that I began to do some
serious number crunching, first with IBM punched card equipment and subsequently with what
later came to be called "first generation" computers: the old vacuum tube jobs. My colleague Bob
Nichols (God bless him) virtually coerced me into letting him teach me the FORTRAN language,
and before long we were writing some pretty fancy multivariate programs for analyzing large data
bases. Some of our data runs were bigger than they had ever had at the Northwestern University
computer center, and on a few occasions we crunched so many numbers so fast that we would
actually fatigue one of the vacuum tubes to the point where it would quit working temporarily.

But the National Merit Finalists represented such a small fraction of the student population
(and such a highly biased fraction at that) that we soon decided to go for broke and to survey all of
the freshmen at a random sample of 248 four-year colleges and universities. We crammed about
20 items of input information onto 5x8 cards which were filled out by 127,000 freshmen in the
fall of 1961. This was designed basically as a survey of student input characteristics which we
eventually hoped to follow up longitudinally. Much of the data, such as the students' choices of
major fields and careers, had to be coded by hand, and all of it had to be keypunched. We had so
much data that in trying to read the input file we sometimes exceeded the reliability of the card
reader or the tape drive. In those days the computer systems' software was such that if you had a
read error from the input file, the whole job was aborted, so these errors were very costly.

I have digressed a bit to talk about our massive data processing problems to make a point:
From our early experience with the Ph.D. productivity problem we came to realize several
important truths about multi-campus studies of college student development. First, you need a
large number of institutions in order to represent adequately the great diversity of college
environments in the United States. Second, you need to measure a large number of variables, not
only to reflect the many different types of student outcomes, but also to make sure that you have
controlled for most of the potentially biasing student input variables. Finally, you need large
numbers of students in order to perform sophisticated analyses of the many input, output, and
environmental variables. None of this would have been possible, of course, without digital
computers and, eventually, without optical scanners.

Assessment Lessons from CIRP

In those days we felt we were really engaging in research that was on the cutting edge of
higher education, and the scientific and technical journals were very receptive to our studies. What
was somewhat frustrating, however, was that the higher education community didn't seem to be
paying much attention to our findings, even though it seemed to us that what we were coming up
with had a lot of relevance to what college faculty and administrators were supposed to be doing. It
was primarily for this reason that in 1965 I jumped at the chance to move to the American Council
on Education (ACE) in Washington to establish a research office. Since the Council is where the
top administrators and policy makers in higher education congregate, I felt, on the one hand, that
these leaders would benefit from a better knowledge of how their institutions were actually
affecting their students, and that we, on the other hand, could benefit from their advice and counsel
in conducting this research.

These expectations eventually proved to be somewhat naive, as I will shortly point out. In
any case, one of our first major activities at the Council was to set up the Cooperative Institutional
Research Program (CIRP). As you might guess from our previous experience at the National
Merit Scholarship Corporation, CIRP was conceived of as a longitudinal study of how students are
affected by their college environments. We started out in the fall of 1966 surveying the entering
freshmen at a representative sample of 300 two- and four-year institutions and published our first
national norms on American college freshmen (Fig. 4). An interesting sidelight to this is that we
initially produced these norms merely as an incentive for institutions to participate. Our main goal,
of course, was to conduct longitudinal studies of students by following up the entering freshmen,
but we were concerned that institutions would not be very interested in participating if they had to
wait several years for feedback from the longitudinal follow-ups. The freshman norms provided
each institution with something they could use right away. Of course, after 22 years these
freshman surveys have acquired an identity and a life of their own; as a matter of fact, the freshman
survey is probably the most visible thing we do now, and many people are not even aware of our
follow-ups and of the basic longitudinal design of CIRP.

As we continued with our freshman surveys and the longitudinal follow-ups, it gradually
began to dawn on us that virtually everything we were learning about assessment and research
from our multi-institutional studies could be applied with equal validity to assessment activities at
an individual institution. Let me briefly review some of these principles (Fig. 5):

1. The need for multiple outcome measures. Clearly, no single institution's programs and
impact on students can be adequately assessed with a single outcome measure.

2. The need for input assessments. It is not very useful simply to know how students
perform at the exit point. In addition to providing a basis for measuring growth and change,
student input data can enhance our understanding of the entering students' background, talents,
aspirations, and educational needs. Such information can be extremely useful in program
planning.

3. The need for environmental data. Just as students who attend different institutions are
often confronted with quite different types of educational programs, so are the students at a single
institution often exposed to quite different kinds of educational experiences. Among the more
obvious environmental differences are the majors students pick, the particular courses they take,
and the particular professors or advisors who counsel them. But there are many other variations
and environmental experiences that can make a substantial difference in how students actually
develop (Fig. 6): where they live while attending college, how they study, how they use their
time, how they support themselves, what kinds of organizations they join, what kinds of
co-curricular activities they participate in, how much and what kinds of contact they have with faculty
outside of class, and whether or not they participate in special educational activities such as honors
programs, developmental education, cooperative education, study abroad, and so on. These are
the things we can directly control on a campus, and any or all of them can make a difference in
how students change or develop during the college years. Perhaps our main job as assessment
specialists on the campus is to help our students and our colleagues gain a better understanding of
how these various environmental factors affect different student outcomes. But note that, in order
to do this, we need to create a data file which incorporates all three kinds of information in a single
place (Fig. 7). Without such a data base there is simply no way that our outcomes assessments
can be used to determine (a) why some students develop so differently than others; (b) what types
of programs and experiences work best for what types of students; and (c) which aspects of our
institutional environment should be preserved and strengthened and which should be changed. I
might also add here that the findings from our FIPSE-funded value-added consortium (which is
making a presentation on Saturday morning) show clearly that the biggest single obstacle to
implementing an effective program of outcomes assessment is the absence of a comprehensive
student data base which incorporates input, output, and environmental data on individual students
in the same place.
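The kind of single student data base described above can be sketched as one record per student that holds all three kinds of information side by side. The sketch assumes the input, environmental, and output data arrive as separate files keyed by a common student ID. The class and function names and the variable names (`hs_gpa`, `residence`, `post_test`) are hypothetical placeholders, not a prescribed schema.

```python
from dataclasses import dataclass, field


@dataclass
class StudentRecord:
    """All three kinds of data for one student, kept in a single place."""
    student_id: str
    inputs: dict = field(default_factory=dict)       # e.g. entering background, pretest scores
    environment: dict = field(default_factory=dict)  # e.g. residence, major, activities
    outputs: dict = field(default_factory=dict)      # e.g. post-test scores, persistence

    def is_complete(self) -> bool:
        """True only if input, environmental, and output data are all present."""
        return bool(self.inputs and self.environment and self.outputs)


def build_database(inputs, environment, outputs):
    """Merge three {student_id: {variable: value}} mappings into StudentRecords."""
    records = {}
    for part, source in (("inputs", inputs),
                         ("environment", environment),
                         ("outputs", outputs)):
        for sid, data in source.items():
            record = records.setdefault(sid, StudentRecord(sid))
            getattr(record, part).update(data)
    return records


# Hypothetical example: student s2 has no environmental data on file.
db = build_database(
    inputs={"s1": {"hs_gpa": 3.2}, "s2": {"hs_gpa": 3.8}},
    environment={"s1": {"residence": "campus"}},
    outputs={"s1": {"post_test": 71}, "s2": {"post_test": 80}},
)
incomplete = [sid for sid, rec in db.items() if not rec.is_complete()]
print(incomplete)
```

The last lines list the incomplete records. This is one simple way to see how far a campus is from having input, environmental, and output data for the same students.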

During the early years of the CIRP we conducted several longitudinal follow-ups and
published a number of books and articles on how students are affected by the type of college they
attend. And even though this work was very satisfying and rewarding intellectually and
professionally, something was missing. In retrospect it seems that the clinical psychologist or
do-gooder part of my personality had been suffering from neglect. In order to give greater expression
to this side of myself, my colleagues and I began to use CIRP data to explore a number of
contemporary social problems and issues. Over the years we produced books on the
disadvantaged student (H. Astin et al., 1970), campus unrest (Astin et al., 1975), open admissions
(Rossmann et al., 1974), college dropouts (Astin, 1975), and ethnic minorities (Astin, 1982). We
even did one for the Carnegie Commission on what we called The Invisible Colleges (Astin and
Lee, 1971): the poorest, smallest colleges that no one seemed to care much about.



A Talent-Development Model of Excellence 
In spite of all this activity, it gradually began to dawn on me that many of the educational
leaders who participated in the Council's activities had very little interest in our studies of student
development and institutional impact. Rather, what seemed to preoccupy the attention of college
and university heads was how to bring more resources into their institutions and how to preserve
and enhance their institutions' reputations. These impressions were what led me eventually to
conclude that our traditional thinking about "excellence" in higher education is dominated by two
views (Fig. 8): excellence as resources, and excellence as reputation (Astin, 1975). As some of
you know, for several years now I have been very critical of these traditional views of excellence
and have suggested an alternative approach which I call the "talent development" view. Under a
talent development approach, excellence is not necessarily dependent on what resources or
reputation you have, but rather on how effectively you educate your students. I realize that some
of our colleagues have been critical of the talent development (value-added) approach, and for the
life of me I can't see what all the fuss is about. Without getting into any of the details of this
debate, let me just say this: To me, what education is really all about is learning, change, growth,
and development. This is all that is implied by the talent development concept. How you decide to
measure change is up to you. But if you don't think that education is fundamentally concerned
with change and growth and development in students, then the talent development concept may
indeed not suit your purposes.

The debate over how we ought to define "excellence" is really a debate over values. Given
that the resource and reputational approaches lead us to compete for higher and higher positions in
the institutional pecking order, the values underlying these traditional views are inherently
competitive, materialistic, and perhaps even narcissistic. The talent development approach, on the
other hand, requires us to do everything we can to contribute to the intellectual and personal
development of our students. Clearly, the implicit values underlying this approach are more
cooperative and altruistic than competitive and materialistic.



What many of us fail to realize is that these same values have tremendous implications for
how we approach assessment in higher education. I have lately come to believe that the values of
an institution are reflected in the kinds of information it collects and pays attention to. Under the
reputational and resource approaches, for example, we are primarily interested in input data on
entering students, since their high school GPAs and admissions test scores are viewed as a measure
of our excellence. We are excellent merely by having bright students, not necessarily by educating
them well. Under these traditional views we might also be interested in output data on the alumni:
how much they earn, how many are listed in Who's Who, etcetera, as another index of our
excellence. If our graduates are excellent, then certainly we must be excellent. Under a talent
development view, however, neither input nor output data, by themselves, are very useful. Rather,
we really need both kinds of data, because what we seek to find out is how much students are
actually learning and how they are developing over time.

Assessment and Values 

But lately I've come to realize that these value questions go far beyond merely collecting
input and output data and trying to understand how much and how well our students are learning.
An equally critical value question concerns what aspects of student development we decide to
assess, and how we choose to assess them. Whether we like it or not, there is no such thing as
value-free assessment. Our very choice of measures (our decision to measure some things and not
others) involves value judgments.

Take, for example, the familiar dichotomy between cognitive and affective outcomes.
Most of us are inclined to shy away from assessing affective outcomes because we think they are
too value-laden. We feel much more comfortable limiting our assessments to cognitive outcomes.
College, after all, is supposed to develop the student's intellect, so how can we go wrong if we
focus on cognitive variables? But wait a minute. If you read through a few college catalogs, you
begin to realize that colleges claim to be concerned about such "affective" things as good judgment,
citizenship, social responsibility, character, and the like. Indeed, most descriptions of the
"liberally educated" person that I've ever read sound at least as affective as they do cognitive. I
want to return to the issue of affective outcomes shortly, but first let's look a little more closely at
the cognitive side.

By far the toughest assessment challenge in the cognitive domain is to measure talent
development in the area of general education. Basically what we try to do here is to assess those
qualities that the G.E. program is supposed to foster and develop.

For some institutions this might include basic communication abilities such as skill in written
composition. I have also seen institutions try to assess reading and even speaking skills, but I've
almost never heard of a program that includes skill in listening. (The pioneering program at St.
Edward's University in Austin, Texas is a notable exception.) What are the implicit values involved
in such choices? Why do we emphasize good writing and speaking so much more than good
listening? What values are involved in such choices?

Our basic purpose in assessing G.E. outcomes is, of course, to reflect the intent of the
curriculum. Yet the G.E. curriculum, whether it consists of core courses or distributional
requirements, is itself a statement of values. If you doubt this, I simply refer you to the recent flap
over Stanford University's revised G.E. curriculum. After much internal controversy and debate,
the Stanford faculty decided to change the required reading list to include more material by and
about women, minorities, and non-western societies. Education Secretary Bennett injected himself
into this controversy by being strongly critical of Stanford's decision, accusing the Stanford
faculty of selling out the great books tradition to transient political pressures. Regardless of whose
views you endorse on this matter, how can you possibly read about this controversy and continue
to believe that any curriculum is not heavily value-laden?

Most of us seem comfortable in trying to assess the cognitive skills that our core curriculum
is designed to foster, whether these be computational skills, verbal communication skills,
computer skills, or critical thinking skills. The reason why we usually do not question such
decisions, I think, is that we tend to believe that focusing on what are in effect vocational skills is
not a value issue. After all, who would argue about the usefulness of equipping students with skills
that will enable them to perform better in their jobs? Are not these skills the same ones that are
most valued by the companies that employ most of our graduates? But what about the other
aspects of the students' lives: their family, their friends, their role as citizens, and their participation
in the community? Is it not a statement of values to decide to focus on talents that are of interest to
employers and to neglect talents that are relevant to the many other aspects of the students' lives after
college?

The Multiple-Choice Tests 
Let's now consider how we assess cognitive outcomes. How value-free are our
assessment methods? Our old friend, the multiple-choice test, is still by far the most popular
approach. Multiple-choice tests are popular for at least two reasons: They can be administered and
scored very cheaply in large groups, and they naturally yield quantitative scores that make it easy to
differentiate among students. Aside from the many technical criticisms that one can make against
this multiple-choice methodology, I have several concerns that are more value-oriented.

First is the way multiple-choice tests are scored. Typically, the number of right
answers (or a weighted combination of rights minus wrongs) is converted into some type of
normed score, either a percentile or a standard score. Now what do we really do when we make
such a conversion? We have lost the basic information about how many items (and which ones)
the student got right and wrong, and replaced this information with a score indicating only how
well the student performs in relation to other students. By using tests that are scored normatively,
we are basically putting students in competition with each other. So the implied value here would
seem to be that the cognitive performance of any given student should be judged only
competitively: How much better or worse did the student do when compared to other students?
This competitive scoring procedure is essentially identical in spirit to traditional classroom grading,
especially if the grading is done "on the curve." I might add that these relativistic and
competitively scored tests are difficult to use in assessing talent development, because they make it
virtually impossible to determine how much a student has actually changed or improved over time.
All we can say is that the student's performance has increased or decreased in relation to other
students.

There is another, perhaps even more subtle problem with normative assessment, whether it
be through letter grades or standardized tests: When we choose to assess performance using a
normed instrument, we create what the economists would call a "scarce good." Only so many
students can be at the top of their class and only so many students can score above the 90th
percentile. No matter how hard students work and no matter how much they actually learn, there
will always be only so many "excellent" test scores and so many "excellent" grades. Normative 
assessment, in other words, automatically constrains how much "excellence" you can have. The 
important thing to realize is that this shortage is a completely artificial one, rather than something 
inherent in the trait being assessed. The shortage, in other words, is something created by the 
assessment method itself. 

As with any scarce good, the scarcity itself tends to exaggerate the importance of being at 
the top, so that below average or even average performance is often viewed as failure. Normative 
scoring, in other words, guarantees that a substantial number of students, if not the majority, will 
view themselves as failures. 
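
To make the arithmetic behind this argument concrete, here is a minimal sketch in Python. It assumes hypothetical raw scores for ten students, norms computed within the group being tested, and one common definition of percentile rank (the percent of the group scoring below a given student); commercial testing programs use national reference groups and other variants. Under those assumptions, every student gains ten items between fall and spring, yet every percentile rank stays where it was, and the number of scores at or above the 90th percentile does not change.

```python
def percentile_ranks(scores):
    """Percentile rank of each score: the percent of the group scoring strictly below it.
    (One common definition, assumed here; testing programs use several variants.)"""
    n = len(scores)
    return [100 * sum(other < s for other in scores) / n for s in scores]

# Hypothetical raw scores (items answered correctly) for ten students in the fall.
fall = [12, 15, 18, 20, 22, 25, 27, 30, 33, 36]
# Suppose every student learns enough to answer 10 more items in the spring.
spring = [s + 10 for s in fall]

raw_gains = [after - before for before, after in zip(fall, spring)]
fall_ranks = percentile_ranks(fall)
spring_ranks = percentile_ranks(spring)

# Raw scores show that every student improved...
print(raw_gains)                            # ten gains of 10 items each
# ...but the normed scores are identical, so the improvement is invisible.
print(fall_ranks == spring_ranks)           # True
# And the number of "excellent" (90th percentile or above) scores stays fixed.
print(sum(r >= 90 for r in spring_ranks))   # 1
```

The raw gains carry exactly the information about growth that the percentile conversion throws away.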

Before discussing the other limitations of multiple-choice tests, I would like to point out 
that there is something that we in the assessment field can do to overcome the negative
consequences of most norm-referenced tests. Very simply, we can insist that the testing
companies give us back the raw score results and, ideally, the results from individual test
questions. Raw scores provide a way to measure how much each individual student is actually
learning or improving over time, without requiring any competitive comparisons with other
students. Furthermore, results from individual test questions can be useful to individual students
in understanding their particular strengths and weaknesses. Item results aggregated across
students can be very useful in curriculum planning and course evaluation. I feel strongly that all of
us who utilize any type of standardized test in our assessment work should begin insisting that the 
testmakers give us this kind of feedback. 
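
The raw-score and item-level feedback proposed here can be sketched in a few lines of Python. The student names and right/wrong data are hypothetical, and "item difficulty" is used in one common item-analysis sense: the proportion of students answering an item correctly. The sketch reports each student's raw score and missed items, then aggregates the results by item across students.

```python
# Hypothetical item-level results: 1 = correct, 0 = incorrect.
# Each list is one student's answers to the same five test items.
responses = {
    "student_a": [1, 1, 0, 1, 0],
    "student_b": [1, 0, 0, 1, 1],
    "student_c": [1, 1, 0, 0, 1],
    "student_d": [0, 1, 0, 1, 1],
}

def raw_score(answers):
    """Raw score: the number of items answered correctly."""
    return sum(answers)

def missed_items(answers):
    """Item numbers (1-based) the student got wrong: feedback for that student."""
    return [i + 1 for i, correct in enumerate(answers) if not correct]

def item_difficulty(all_answers):
    """Proportion of students answering each item correctly, aggregated across students."""
    n_students = len(all_answers)
    return [sum(item) / n_students for item in zip(*all_answers)]

for name, answers in responses.items():
    print(name, raw_score(answers), missed_items(answers))

# An item nobody answers correctly (here item 3) may point to a gap in the curriculum.
print(item_difficulty(list(responses.values())))
```

An item that no student answers correctly, as with item 3 in this made-up data, is the kind of result that could prompt a second look at how that material is taught.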

My second concern about multiple-choice tests is the artificiality of the task itself. After 
students finish their formal education, the ability to find a correct answer from a predetermined set
of alternatives has a very limited usefulness. How often in real life is any of us presented with a 
prepackaged set of possible answers to a question, only one of which is correct? And how often 
are we required to read the question and find the answer under intense time pressure? How often 
do life's problems take such a bizarre form? And what about the myriad real-life problems that call 
for creative solutions? My point here is that the ability to perform well on such tasks is so highly
specialized and so foreign to the kinds of real-life problems that we normally confront that I really
wonder if we educators have been wise to make such liberal and uncritical use of the multiple-choice
test. The testmakers might respond that such tests have "predictive validity," and indeed
they do. But in such validity studies the outcome being predicted is almost always school or 
college grades or simply another test constructed in the same manner! 

This problem is perhaps best illustrated by the fact that the multiple-choice test is not well-suited
to many important kinds of educational outcomes, but especially not to those that require the
exercise of creative talents. Creativity can be expressed basically in two ways: through creative
products, and through some sort of creative performance. (Fig. 9) Creative products include such
things as essays, research papers, scripts, films, videos, works of art, and musical compositions.
Creative performances include equally diverse activities such as public speaking, dance, musical
recitals, and theater productions. Depending upon how broadly you choose to define "creativity,"
you might also include performance outcomes such as leadership behavior, public service, and
athletics. Clearly, the multiple-choice test is an inappropriate technology for assessing many types
of creative outcomes that are highly valued not only in the academic community but also in later
life. Thus, when we insist upon putting our principal assessment emphasis on student outcomes
that can be measured through multiple-choice tests, we are implicitly assigning low priority to
student creativity.

My third value concern about normatively-scored multiple-choice tests is much more
subtle and philosophical. It has to do with the fact that when we administer and score such a test,
we are implicitly placing the student along some kind of narrow continuum together with other
students. In a sense, we are forcing all the square pegs and rectangular pegs and other oddly-shaped
pegs into the same round hole. This process does considerable violence to the rich and
marvelous diversity that characterizes any group of students. To think that we can adequately
capture this diversity through one or even several multiple-choice test scores is absurd.

My final concern about multiple-choice tests is the distance that they put between the
student and the professor. Not only is the administration and scoring done impersonally, but the
numerical feedback is dry and impersonal as well. Clearly, such a procedure is inimical to the
close student-faculty contact that much of the research in higher education shows to be so
important.

Holistic Methods 

By this time you may wonder, "If Astin is so dubious about course grades and 
standardized multiple-choice tests of cognitive functioning, what would he put in their place?" 
Clearly, the student needs feedback of some kind, and the institution needs some way to document
the student's progress. In my brief discussion of creative performance, I have already suggested
that the individual creative products and performances of students probably have to be assessed in
some kind of holistic fashion. The more I think about this problem, the more convinced I become
that holistic feedback, whether written or spoken, is far and away the most powerful assessment
tool we have for enhancing the educational process. Specifically, I am thinking here of the kinds
of written feedback that students receive at places like Hampshire College, Alverno College,
Empire State College, and the University of California at Santa Cruz. If you ever had occasion to
see such narrative feedback, especially from the individual student's perspective, you would 
appreciate its potential educational value. Not only is the feedback itself extremely informative and 
useful to the student, but since the process itself requires the professor to get to know the 
individual student's work personally, narrative feedback strengthens and enhances the relationship 
between student and faculty member. 

This discussion is not intended to imply that normed multiple-choice tests have no place in
the assessment of cognitive development; rather, I simply wanted to challenge the widely-held
belief that such tests are always the best method for assessing cognitive outcomes. Like every
other method, they have their advantages and limitations, and if one of the purposes of assessment
is to enhance the teaching-learning process and to bring about greater contact between faculty and
students, then narrative feedback has much to offer.

One problem with holistic written evaluations of student performance, of course, is that 
they do not necessarily yield numerical estimates of student performance to put in our data base and 
to use in our statistical analyses. This is a real problem for us number-crunchers. I would like to 
stress, however, that there is nothing inherent in the narrative or in any other qualitative assessment
method that precludes quantification, although not all institutions that use narrative
feedback try to quantify their evaluations. One very simple approach would be to have the 
evaluator also complete a brief set of rating scales, with each scale representing a different skill or 
area of knowledge. Such an approach is not unlike the quantification involved in scoring essays or 
in judging musical and artistic compositions. 

Another, perhaps more serious objection to holistic narrative evaluations is that they are
highly labor intensive. If professors are going to be asked to undertake such evaluations of their 
students, what kinds of trade-offs are we going to make in terms of the professors' other job 
responsibilities? I believe that the best way to approach this question is first to recognize that we 
are once again dealing with a question of values. If we believe that students can benefit
significantly from the experience of having a professor get to know their work well enough to 
write a detailed narrative evaluation of it, then what other, presumably less useful, activities can be 
traded off against the time required for the professor to carry out the evaluation and discuss it with 
the student? Each institution, of course, will have to answer this for itself, but it seems to me that 
one reasonable trade-off would be for professors to do a bit less lecturing or even to teach one less 
class. Not only would the students benefit from the personalized feedback, but the professors 
would probably welcome the variety introduced into their pedagogical activities. Certainly it 
behooves us to begin to study the potential efficacy of such tradeoffs. 



Assessing Affective Outcomes 
Let's return now to the issue of affective outcomes. I've already suggested that the
importance of such outcomes is implied in the mission statements appearing in many college
catalogs as well as in the notion of the liberally-educated person. What are some of these outcomes
and how can they be assessed? In the area of affective skills we have a variety of potentially
important qualities such as interpersonal competence, leadership ability, and empathy. The ability
to empathize with others is, incidentally, probably dependent to some extent on one of the
communications skills listed earlier: listening ability.

There are a number of other affective outcomes that seem to be relevant to the goals of a 
liberal education, although we do not ordinarily consider them to be "skills." These would include 
self-knowledge and self-understanding, maturity, motivation for further learning, understanding of 
other peoples and societies, self-esteem, social responsibility, and even good mental and physical 
health. 

One affective area that needs more attention in our assessment activities is the student's
own values. As some of you know, I have been involved in monitoring the values of incoming
freshmen through the CIRP surveys for some 22 years now, and what I see happening troubles
me. During the past two decades, students have become markedly more materialistic and more
concerned with having power and status. (Fig. 10) They are increasingly coming to see an
undergraduate education primarily as a means to make more money and less as a place to get a
general education. (Fig. 11) At the same time, students have become less concerned about the well-being
of others, the environment, and the community. (Fig. 12) These value changes have been
accompanied by similar changes in the students' career plans, with careers in business reaching all-time
highs in popularity, and careers in the human service occupations reaching all-time lows.

Lately the higher education community has begun to counteract some of these trends by 
creating a number of programs that are designed to encourage student participation in public and 
community service activities. The Campus Compact project, for example, is a consortium of some
130 institutions that are working together to establish community service programs for
undergraduates under the sponsorship of ECS. In my own state of California the legislature has 
passed a law requiring each campus of the university and the state university to establish some kind 
of volunteer or public service program for undergraduates.

In last year's report to the Board of Overseers, Harvard president Derek Bok said that 
"universities should be among the first to reaffirm the importance of basic values such as honesty,
promise keeping, free expression, and nonviolence...[and] there is nothing odd or
inappropriate...to make these values the foundation for a serious program to help students develop
a strong set of moral standards." Bok also notes that "students must get help from their 
universities in developing moral standards or they are unlikely to get much assistance at all." 

These trends indicate that our political and higher education leaders are implicitly
suggesting that social responsibility and concern for others is one of the qualities that higher
education institutions should try to foster in their students. Certainly it would seem appropriate for 
those of us in the business of trying to assess student outcomes to join forces. Let's begin to 
introduce longitudinal measures of qualities such as empathy, concern for others, tolerance, social 
responsibility, and the like into our assessment programs. 

Let me get just a bit more personal about this. As I look around me everywhere I see the
great achievements of the intellect: atomic energy, genetic engineering, modern agriculture,
modern medicine, and computers and other electronic marvels of every conceivable type. It is truly
astounding. And at the same time I see the great affective and emotional and spiritual divisions that
threaten our very existence: religious fanaticism and hatred, racial prejudice, nationalism and
political divisions, widespread criminal behavior in the land of opportunity, and massive poverty
and starvation in the face of unprecedented affluence. What this tells me is that it is time to redress
the balance. It is time to begin shifting some of our educational interest and energy in the direction
of our affective side, to begin concerning ourselves much more directly with the development of
beliefs and values that are going to heal our divisions, and which will help to create a society that is
less materialistic and competitive and selfish and more generous and cooperative.



Beyond Narcissism 
Let me close by returning once again to our different conceptions of excellence. In the long
run, the kinds of assessments that we perform and how we use the results will depend heavily on
our values, which means in part what conception of excellence we ultimately choose. During my
days as a clinical psychologist I would have looked at the reputational and resource views as
basically egocentric, in the sense that they emphasize what possessions we have and what others
think of us. (Perhaps "narcissistic" would be even more accurate than "egocentric.") Under a
talent development view, we identify ourselves instead in terms of what we do, what we contribute
to others and what they in turn contribute to their communities and to the society. In other words,
by adopting a talent development perspective, we, in effect, transcend our institutional egos to
some extent and begin to view our institutions more in terms of their impact on the larger society.

The basic point to be made here is a rather simple one: When an institution exists primarily 
for its own sake, and when it identifies itself primarily in terms of its resources and reputation, its 
relationship with the society it is supposed to serve becomes exploitative and defensive, and its 
capacity to serve as an instrument for improving the society is compromised. In short, the biggest
obstacle to higher education's serving as a major instrument for societal improvement is the
institutional ego. In a sense, our colleges and universities need to learn how to transcend their 
institutional egos and to become more actively involved in what is going on in this society. 

The notion of transcendence is, of course, a frequent theme in psychological theories. 
When I was doing psychotherapy, I was continually made aware of the fact that my ability to help 
a troubled patient was limited to the extent that I allowed my ego to become a prominent part of the 
therapeutic process. Egotistical therapists are less helpful to patients because their sense of worth 
comes primarily from the power they wield over the patient and from the patient's sense of
helplessness and dependency. The most effective therapists, on the other hand, are those who are
able to transcend their egos to the extent necessary to empathize fully with the patient and to
create an accepting and supportive therapeutic environment. The therapeutic focus, in other
words, should be on the patient rather than the therapist.

I believe that there is a clear analogy here between patients and therapists, on the one hand,
and students and institutions, on the other. The capacity for higher education to be a positive
change agent in American society will depend upon our ability to transcend our institutional egos,
our narcissism, and our self-interest, and to concern ourselves more directly with the impact we are
having on our students and communities.

I believe that the key to achieving this kind of institutional transcendence is in how we
ultimately define our own excellence. Rather than continuing to see our excellence as limited to
what we have (resources) or to our status and prestige (reputation), we need to see it more in terms
of what we do and what we accomplish. And to do this we need to rely heavily on assessment.
We need to know many things: How much and how well do our students learn? How are we
affecting their values and attitudes? What kinds of citizens and what kinds of parents and spouses
do they make? Are they becoming more humane, more honest, and more concerned with the welfare of
others? Are they becoming more active and better-informed participants in the democratic process?
If we who are involved in the assessment game could succeed in persuading our faculty and
administrators simply to begin seeking answers to such questions, we would be taking a major
step toward institutional transcendence.



ASSESSMENT AND INCENTIVES:
THE MEDIUM IS THE MESSAGE 



An address by:

Linda Darling-Hammond
The Rand Corporation 



Presented at: 

The AAHE Assessment Forum 
Third National Conference on Assessment 
in Higher Education 
June 8-11, 1988 
Chicago, Illinois 



Assessment and Incentives: 
The Medium Is the Message 
Linda Darling-Hammond 



I come to the question of assessment as a teacher: as a
former secondary school teacher, as a faculty member in an
undergraduate institution, and as a teacher in a graduate school of
education. I see the issue of assessment as most important in
its impact on the practice of teaching.

There are many other reasons why we conduct assessment, of
course, and they are the reasons this conference has attracted a thousand of
you this year, a tremendous increase since a couple of years ago,
when this issue was not very prominent on the higher education
horizon. I suppose one of the reasons so many of you are here is
suggested by the comment made by Mark Twain many years ago: "The
legislature is in session, and no man is safe." The stimulus from
statehouses and boardrooms to create assessment systems in
colleges and universities is first and foremost a challenge to
reframe the incentives that govern academic life. That is both
its power and its potential danger.

As you know, if there's one thing social science research
has found consistently and unambiguously (and there are few
things on which we social scientists agree), it's that people
will do more of whatever they are evaluated on doing. What is
measured will increase, and what is not measured will decrease.
That's why assessment is such a powerful activity. It can not
only measure reality but change it. One reason is that when
assessment comes with stakes attached to it (stakes such as
accreditation, student continuation in an institution, or faculty
evaluation), it changes behavior and creates trade-offs.



HOW MEASUREMENT CHANGES BEHAVIOR 

We can see this in other aspects of social life. Recently
the FAA required airlines to begin to report how often they
arrive on time at their destinations. The trade-off that poses,
one acknowledged by that regulatory agency, is a trade-off with
safety concerns. The question becomes how often, because they
are evaluated on time of arrival, airlines will fail to make
repairs that would make them run late. That potential problem
has been partially taken into account in that particular
assessment structure. But the point is that where you find an
assessment measure, you will find a trade-off that has to be
either taken into account or sacrificed to maximize a particular
objective, in this case timeliness.

In American business, we have seen quite clearly the results
of an emphasis on measuring short-term sales and profits, as
against longer-range viability and market share. In fact, many
analysts believe that American automobile manufacturers have lost
market share to manufacturers in other countries because the
measures that drove their operations, on which they gauged their
performance, were not the long-run measures that would make them
viable, productive industries over the long haul. Much of the
change in business procedures in this country is an attempt to
correct for what is now viewed as an overemphasis on
inappropriate measures of success over the previous couple of
decades.

In medicine, we have recently seen the publication of 
hospital mortality rates: the chances you have of dying if you go 
into a particular institution. Well, very quickly, we have begun 
to realize that that particular measure will produce incentives 
for hospitals not to care for very sick people. So, if we do not 
find ways to measure quality of care more appropriately, we will 
find institutions beginning to turn very sick people away because 
it will affect their mortality-rate measures.

The point is, in any arena, what you choose to measure and 
how you choose to use those measures will affect the very 
functioning of the institution. That is precisely the problem 
that you are engaging here today and will be grappling with in 
your institutions over the coming years. 

I entitled this talk "The Medium Is the Message" in
remembrance of Marshall McLuhan's point that aside from the
content of any communication, its form carries a meaning and
effect all its own. So it is with educational assessment. What
and how we look at what we do will, in itself, affect what we do
by creating a set of incentives and disincentives for how we do 
it in the future. That can be powerfully positive. In some 
instances it can be powerfully negative. 

For more than a decade, state policy makers have been
reforming American elementary and secondary education, seeking,
especially, to make it more accountable. During a period when,
for the most part (until the last couple of years), they were
less willing or able to invest resources in the enterprise,
assessment, especially student testing, has been a major part of
that activity. And as the uses of these assessment data have
grown, they have profoundly changed the shape and content of
elementary and secondary education.

I think there are two implications of that activity for what 
is occurring in higher education today. One is that the forms of 
assessment that have been used for some years now in elementary 
education have, in part, produced the kinds of students who now
enter your institutions. Students about whom state policy makers
wring their hands. Students who, because they've been trained to
take multiple-choice tests on a yearly basis, have stopped
writing in their classes, have stopped reading books, have
stopped engaging in complex problem solving or time-consuming
laboratory tasks, and have come to college unprepared to engage
in those activities. In fact, some of the events that have led
to a concern for higher education accountability have been
produced by the reforms of elementary and secondary education
that have had a less than happy history. At the same time, the
fact of reform at the K-12 level has produced a desire to reform 
higher education as well. If this new wave of reform takes the 
same course that the reforms of elementary and secondary 
education have taken, we may have lost the entire battle. 

At the moment, the higher education enterprise in this 
country is well thought of the world over. Other countries pay 
to send their best and brightest to American colleges and 
universities to study and return home to increase the human 
intellectual capital in those countries. In many cases, those 
same countries consider the secondary and elementary systems in 
this country sub-standard compared to their own. What an irony. 
One of them, a gentleman from Britain, said to me, "Why don't you
run elementary and secondary schools the way you run your
colleges and universities, giving them a certain amount of
control over how they conduct their work?" In fact, we're moving
in the opposite direction: The policies that affect higher
education increasingly resemble those that have already
structured elementary and secondary education in this country.
And, as I'll describe in a little while, some of those policies
at which other countries turn up their noses have a great deal to
do with assessment. 

Now state policy makers have discovered higher education, 
and they're naturally extending their pet ideas to this sphere. 
Like assessment. In some cases they challenge you to assess 
yourselves. In others, they tell you how it should be done. In 
higher education, as in the K-12 arena, student testing is a key 
feature of many assessment plans. Though the same road has not
yet been taken, the issues are similar and the outcomes potentially
the same.

So, what I'd like to discuss this afternoon is the 
difference between assessment as an improvement vehicle and 
testing as an accountability tool. I'd like to place what is
occurring in American higher education today in a context of 
assessment more generally, both here and in other countries. 

INCENTIVES: A PARABLE WITH LESSONS 

The importance of assessment can be demonstrated with a 
little story. The events I'm about to recount took place in 
Wonderland. And they happened something like this: 

Once upon a time in Wonderland, a prestigious national 
commission declared that the state of health care in that
country was abominable. There were so many unhealthy people
walking around that the commission declared the nation "at
risk." Sweeping reforms were called for. In response, the
major hospitals decided to institute performance measures of 
patient outcomes, tying decisions about patient treatment 
and dismissals to those measures. 

The most widely used instrument for assessing health in
Wonderland was a simple tool that produced a single score 
with proven reliability. That instrument, called the 
thermometer, had the added advantage of being easy to 
administer and record. No one had to spend a great deal of 
time trying to decipher doctors' illegible handwriting or 
soliciting their subjective opinions about patient health. 
When doctors discovered that they would be held accountable 
for how many of their patients had abnormal thermometer 
scores, some complained that this was not a comprehensive 
measure of health. Their complaints were dismissed as 
defensive and self-serving. The administrators, to ensure 
that their efforts would not be subverted by recalcitrant 
doctors, then specified that subjective assessments of 
patient health would not be used in making decisions. 
Furthermore, any medicines or treatment tools that were not 
known directly to influence thermometer scores would no 
longer be purchased. 

After a year of operating under this new system, more 
patients were dismissed from the hospital with temperatures 
at or below normal. Prescriptions of aspirin had 
skyrocketed. And the use of other treatments had 
substantially declined. Many doctors had also left the 
hospital, arguing obtusely that their obligation to patients 
required them to pay more attention to other things than 
scores on the thermometer. Since the thermometer scores 
were the only measure that could be used to ascertain 
patient health, there was no way to argue if they were right 
or wrong. 

Some years later, during the centennial-year census in 
Wonderland, the census-takers discovered that the population
had declined dramatically, and mortality rates had
increased. As people in Wonderland were wont to do, they
shook their heads and said, "Curiouser and curiouser." And
they appointed another commission.

That little piece of history is being repeated in a number of
different ways in American education. And it illustrates how it
is that the choice of performance measures is extremely important
when these measures become sanctions, incentives, and decision-making
tools. In terms of assessment in higher education,
there's a wide range of experience from across the states. There
have been a number of positive effects already: a rebirth of
accreditation and renewed attention to what it can mean, a
renewed interest in comprehensive exams for students, surveys of
graduates to find out what they're doing and what they have
achieved since leaving college. All of those things can be very
informative.

At the same time, some studies of state-mandated assessment
programs, and particularly a recent one by Peter Ewell and Carol
Boyer, have pointed out trends that make clear the importance of
understanding the experiences of elementary and secondary
education.

First, they found that state policy makers search for 
models. Because they have been through this with elementary and 
secondary education over the last 10 or 15 years, they often use 
the K-12 testing experience as a guide for what ought to be done
in colleges and universities.

Second, assessment mandates, although they can be subject to
a range of interpretations as to what forms of assessment are
required, are often interpreted, either by the institutions or
the regulators, as requiring common, standardized achievement
testing, although a close look at the language of the mandate may
not uphold that interpretation.

Third, what Boyer and Ewell call "the press to test" often
absorbs the lion's share of assessment attention and resources,
though tests may be one of only eight or ten different criteria
or bases for assessment.

Fourth, assessment can prod introspection, evaluation, and
improvement. Depending on how it's structured, it can also
produce excessive paperwork, a narrowed curriculum, and a focus
on ways to boost scores at the expense of real learning.
Indeed, a recent study of the course of Tennessee's
performance-funding initiative observed that, although a wide
range of indicators were to be used and available, policy makers
increasingly want to focus primarily on comparative test scores
across institutions. This produces a range of incentives that
can work against both educational improvement and equality of
educational opportunity.

A lot depends, of course, on the forms of assessment, the
uses made of assessment results, and the stakes that are
associated with assessment. Over the last twelve or fifteen
years in elementary and secondary education, assessment has
become, first, narrower in form and content with each successive
state mandate; second, used for more purposes than those for
which it was originally designed; and, third, tied in its results
to greater and greater stakes with each successive refinement of
state policy.

The same is likely to occur in higher education unless a new
view of assessment is born and spread throughout this country.

THE K-12 EXPERIENCE: HIGH-STAKES ASSESSMENT

It's instructive to take a look at a brief history of what
has happened in your sister institutions at the K-12 level. In
the early 1970s, accountability legislation was passed in a
great number of states, and it looked very much like the
legislation that you're encountering today. It required planning
and assessment systems; it was process-oriented. The
institutions had to pledge to undergo some sort of internal
evaluation and assessment.

Five or six years later, by the late 1970s, those methods
had proved to be insufficient for policy makers' goals: minimum
competency tests were instituted. First, they were used as the
criteria for graduating seniors. Later, the mandates were
extended throughout the grades, with testing in grades 3, 5, 7,
3, and 12 in many states. A requirement that they be nationally
normed standardized tests became widespread. By then, the tests
were to be used not only for graduation but for promotion,
tracking, and sometimes (but not always) remediation of students.
None of this provided what policy makers considered sufficient
accountability.

And so, in the mid-1980s, the states began to mandate
curriculum guidelines that were heavily specified and aligned
with the test. So, the current rage is curriculum alignment.
The tests themselves didn't produce the outcomes desired, so now
curricula are being redefined and mandated to match the test.

Part of the reason for this trend is the increased use of
test results and their higher stakes. For example, standardized
tests are now used as the basis for decisions about even such
things as students' graduation from kindergarten in states like
Georgia. They're used for decisions about remediation, tracking,
placement in gifted and talented programs, and a variety of other
things, things that these particular tests were not created or
designed to do. In some states, they're now used as the basis
for decisions about school funding. (We see that in the higher
education realm in a few places.) Mandated tests are now used in
elementary and secondary education as the basis for evaluating
teachers for promotion, for career-ladder status, for salary,
even for tenure and retention. They're used in some places for
evaluating administrators.

We see in this brief account of the K-12 experience that
three things tend to happen when assessment measures are
introduced. First, once a measure gains currency, policy makers
have a tendency to use it for purposes and decisions not
originally intended, as they arise, and regardless of whether the
measure is intended for or suited to that purpose. Second,
quantitative, comparative data tend, over time, to override or
overwhelm other forms of information, especially for people who
are not expert in that area or enterprise. The tendency is to
turn quickly and unskeptically to numbers, because, although we
can't decipher all the other information about the quality of
your physics programs, we all know that a 42 is better than a 38.
Third, there's a tendency to forget what the number represents,
if that was known to begin with. Robert Sternberg, a noted
testing expert at Yale, makes the point that "the appearance of
precision is no substitute for the fact of validity." Try
telling that to a legislator who doesn't care to examine the
disjuncture between the goals of your international studies
program and the material on the ACT-COMP.

Once a measure is used to make high-stakes decisions about
students, faculty, programs, or institutions, it can take on a
life of its own. Walt Haney and George Madaus at the Center for
Evaluation at Boston College make several points about
high-stakes assessments, whether they are tests or other kinds of
assessment.

First, the more any social indicator is used for social
decision making, the more likely it will be to distort the social
processes it is intended to monitor: in this case, an educational
institution. Basically, any measurement of the status of an
educational institution, no matter how well contrived, inevitably
changes its status as people try to secure more of whatever is
being measured.

Second, if important decisions are presumed to be related to
test results, then teachers will teach to the test. This is easy
enough to understand. High-stakes tests, it is argued, can, on
the positive side, focus instruction, giving students and
teachers specific goals to attain. Unfortunately, because such
tests are indirect measures of the actual learning we care about,
it's possible to do all kinds of things, quite successfully, to
raise the test scores without actually increasing the amount of
learning taking place. In fact, much recent research on the
effects of high-stakes testing has shown that as scores on the
instrument being used for assessment increase, scores on other
measures tend to decline because of the shift in emphasis to that
which is being measured. This has been studied and found to be
so across tests, across settings, and in a number of countries.
Studies of the effect of examinations on learning in Australia,
India, Japan, Ireland, and England all turn up the same kind of
result: teaching to the test correlates with a de-emphasis
on other forms of learning. So, we have to be very sure that
what we're testing for is what we want, in fact, to promote, and
that we don't, in fact, value other things as much as or more
than what we're measuring.

Third, teachers pay particular attention to the form of the
questions on a high-stakes test (multiple choice, short answer,
essay), and they adjust their instruction accordingly. The
problem here is that the form of the test question can narrow
instruction to the mode of the test as well as to its content.
Haney and Madaus give this example from the Georgia Regents'
testing program, which is designed to assess competencies of
college students in reading and writing. The head of one English
department in that state lamented:

Because we are now devoting our best efforts to getting a 
larger number of students to pass the essay exam, we are 
teaching to the exam, with an entire course given over to 
developing one type of essay: a five-paragraph argumentative
essay written under a time limit on a topic about which the 
author may or may not have any knowledge, ideas or personal 
opinions. Teaching this one useful writing skill has the 
effect of bringing a large number of weak students to a 
minimal level of literacy. But, at the same time, it 
devastates the larger purposes of the composition program, 
which should also be challenging students to produce a 
variety of different kinds of writing at more than a minimum 
level of competency. Because the Regents' Test is 
primarily designed to establish a minimal level of literacy, 
our teaching to the test, which its importance forces us to 
do, tends to make the minimal acceptable competency the goal 
of our institution — a circumstance that guarantees 
mediocrity. 

In fact, however, Georgia is luckier than most. At least they 
are using an essay exam. In many places, and in K-12 education 
generally, multiple-choice tests mean that the incentive 
structure produces virtually no writing at all in classrooms. In
fact, in most places now, particularly in large cities across
this country, reading instruction has come to resemble the
practice of taking a reading test. In reading class, students
use commercial materials to decode short paragraphs about which
they then answer multiple-choice questions. The teaching
materials have evolved to resemble the tests the students will
take, and tests dictate both the content of instruction and the
teaching methods.

The fourth effect of high-stakes assessment is that, when
test results are the sole or even the partial arbiter of
future life choices, society tends to treat them as the major 
goal of schooling rather than a useful indicator of achievement. 
So, ultimately, a student's score on the ACT-COMP, for example, 
may mean more to society than what she knows, more than anything 
else she has produced or done in the institution that she 
graduated from. 

Finally, a high-stakes test transfers control over the
curriculum to the agency which sets or controls the examination.
In this sense, obtaining control over the choice, content, and
substance of whatever those measures are is the most important 
thing any institution can do to maintain control of its destiny. 
The phenomenon, as Haney and Madaus point out, is well understood 
in Europe, where systems of external certification exams 
controlled by central governments or independent examination 
boards operate at the secondary level. In this country, 
authority is increasingly being assumed for such decisions by 
state education agencies. And further, since states often use or 
mandate tests developed by outside test development companies, it 
is important to realize that the state may effectively be 
delegating this very real power to a commercial company whose
interest is primarily financial and secondarily educational. 

LIMITS OF STANDARDIZED TESTING 

These five effects of high-stakes testing place a great
responsibility on those who choose the measures. The coming of
standardized achievement testing to higher education in America 
is a particular problem given the nature of testing in this 
country, which is very, very different from testing in most other 
countries. The United States invented McDonald's, and we 
invented multiple-choice, norm-referenced, standardized tests. 
We tend toward the quick, easy, and convenient, if less 
nourishing, approach to getting things done. But the kind of 
tests that are widely used in this country are rarely used 
abroad. And they are poor measures of most things that colleges 
are supposed to get students to do. In fact, if you think about 
it, the idea of a standardized, multiple-choice test of higher 
education is a stunning idea. What could such a thing be? Maybe 
a collegiate-level Trivial Pursuit, sampling those facts that a 
college educated person ought to know at the end of four years? 
Or maybe something that focuses on process: a set of general
abilities that college is supposed to encourage? That means, in
the parlance of psychometricians, that it probably measures the
"g factor," that combination of general intelligence and
test-taking skills which allows SAT, NTE, and GRE scores to
correlate at a level of .9, irrespective of content and in
virtually any sample in which you try to make that correlation.
That doesn't
tell you much about what a particular college or university 
produces in students, beyond what you could predict from knowing 
their SAT scores when they came in. 

It's striking to think about such tests in contrast to other
traditional forms of higher-education assessment: comprehensive
examinations, theses, oral exams (which are still popular in some
places, although less so than earlier), the use of outside
examiners, demonstrations, exhibitions that are
performance-oriented: methods that in fact tell you whether
somebody can do what they've supposedly been taught to do.

Indeed, many argue that it is the excessive use of narrowly
defined, standardized testing in K-12 education that has produced
some of the problems that colleges are now expected to solve:
students who come to you unable to write, think critically, solve
problems, design and complete projects. The kinds of tests more
frequently mandated require passive responses; they measure
recall and recognition of facts; they aren't performance-oriented.
And they can't represent either vast domains of knowledge or
higher-order thinking skills well.

In contrast, in Europe and many parts of Asia, testing does
not usually involve machine-gradable answer sheets filled out
with number-two pencils. In England, students prepare written
examinations in each of their areas of specialty. They also
submit material from their portfolios of coursework. At the end
of the secondary level in France, students take written and oral
examinations in five areas. One of those, required of all
students, is philosophy. Here are some sample questions from a
recent exam: "What is judgment?" "Why should we defend the
weak?" Compare that to what we might see in multiple-choice
assessments in this country. In Germany, teacher recommendations
and grades, written examinations in German, foreign language,
math, science, and social studies, and an oral exam are the basis
for assessment. In Russia you might encounter an essay
examination in Russian literature and a series of oral
examinations with two examiners in a variety of other areas.

And in many of these countries you would encounter, as well, 
a different notion of what assessment means to the structure of 
the educational profession. Faculties convene to develop the 
assessments. Furthermore, it is both the privilege and 
obligation of professors to examine the students of other 
professors at other institutions. The outside-examiner tradition
in those places means that the act of assessment improves
knowledge and cross-pollination across the enterprise as a whole,
among professional faculty and the students. It is not the
primary purpose of such assessment to produce and report
two-digit data points to some other authority.

Recently, representatives from a group of countries came 
together to talk about international assessment and indicators. 
The United States made a proposal that the other countries adopt 
our National Assessment of Educational Progress (or NAEP) as the 
outcome measure. Not a single other country could be persuaded
that doing so would in any way enhance the quality of their
educational systems. In fact, the view most frequently heard was
that assessment was too important to allow the adoption of a
black-box measure that could drive instruction in directions we
do not want to see it go.

It's time for educators in this country to begin to think 
about assessment as sufficiently important to heed such warnings. 
We must insist on creating and retaining control over measures of 
what we think a liberally educated person in this society needs 
to be able to know and do. We must not abdicate any of that 
responsibility to a measure that we don't thoroughly endorse. 

There is a revolution in testing and assessment beginning to
occur in this country. The kinds of concerns that I've been
citing, primarily from the elementary and secondary domain, have
begun to spur some action. ETS, for example, realizes that the
days of the kind of test it has traditionally marketed are
limited, and it's beginning to develop other forms of assessment.
ETS president Greg Anrig has said, "the tests of the future will
be individualized, open-response explorations of what students
are learning over time." A new version of the National Teacher
Examinations is already being devised for some of the same
reasons. We need assessments which reflect performance. As I've
mentioned, that's already the case in many other countries, where
faculty are the ones developing the tests. They don't have to
wait until a test development firm decides to do it. I think
that is going to have to begin to happen here as well.

EFFECTS OF TESTING ON TEACHING AND LEARNING 

We need better assessment because our tendency to treat
tests as black boxes has become an increasingly serious problem.
One example of how we're missing the mark is illustrated by the
NAEP results, in which we have seen over the decade that
students' abilities to understand basic concepts and principles,
to analyze and make inferences, and to do problem solving have
declined. The recent NAEP findings on reading achievement were
accompanied by this commentary: "Only 5 to 10% of students can
move beyond initial readings of a text. Most seem genuinely
puzzled at requests to explain or defend their points of view."
"Current methods," the NAEP assessors say, "of teaching and
testing reading require short responses and lower-level cognitive
thinking, resulting in an emphasis on shallow and superficial
opinions at the expense of reasoned and disciplined thought."
And given what most reading tests measure, it is not surprising
that students haven't been taught more comprehensive thinking and
analytic skills. And so it goes in science, in mathematics, in
history, in writing, and throughout the various subject areas.

Surveys have shown us that during the time state policy
makers began to institute test-oriented accountability measures
(between 1972 and 1980), the use of teaching methods appropriate
to the teaching of higher-order skills declined in American
public schools. There was a decline in methods such as
student-centered discussions, writing essays or themes, and
project or laboratory work. In 1980, fewer than two-thirds of
high school students wrote regularly in any of their classes.

The National Science Foundation, the National Assessment of
Educational Progress, and the National Councils of Teachers of
English and Mathematics have all attributed this decline in
students' problem-solving abilities to basic-skills testing in
American schools. They charge that the emphasis on teaching what
is tested in multiple-choice, standardized achievement tests has
resulted in the neglect of higher-order thinking skills and
performance abilities. There has been a de-emphasis on subjects
that are not tested and on modes of performance other than
multiple choice, short answer, and fill-in-the-blank responses to
predefined questions.

A number of recent studies uncover similar effects in
colleges of education where standardized tests for graduation and
licensure have been introduced and where, in some cases, the
pass rates of students on those tests have had implications for
program approval. It is imperative that, as colleges and
universities begin assessment activities, the difficulties that
have undermined the health of the elementary and secondary system
not be allowed to repeat themselves.

In fact, my hope is that more productive forms of assessment
that are being developed already in many of your institutions
filter down to the elementary and secondary schools. The caliber
of those students who come to you from those schools will, after
all, be in large measure a function of the caliber of assessment
that goes on there. In one recent issue of Education Week, there
were three headlines in the course of about as many pages that
give an indication of what's happening to school curricula. One
headline said, "Using real books to teach reading said to heighten
skill and interest." Why is this news? Well, as the article
goes on to say, "the emphasis is on test-taking skills, not on
learning reading or writing as a creative process." Another
headline reads: "Best writing instruction uses all classroom
resources and engages students in writing." In fact, a review of
72 studies shows that people learn how to write by writing! And
not by being drilled on the rules of grammar. A third headline
reports, "Study finds a neglect of humanities." The article
indicates that half of the state officials surveyed attributed
this neglect to the back-to-basics movement's reinforcement of
teaching to standardized tests.

Ultimately, if we give the message that test scores are more
important than learning, we'll see increases in test scores, but
not necessarily in learning. Indeed, Lake Wobegon, where all the
women are strong and all the children are above average, has
already come to pass. A recent report released by the Friends of
Education documented what some of us have been observing for some
time: every state in the nation now reports that its test scores
are above the norm.

There are more important problems, though, when we think of
using assessment, in the form of testing, as a public-policy
tool. And I've seen this in some of the recently enacted state
policies to which some of you are subject.

POLICY MAKING AND ASSESSMENT 

First, some policies define the institution as the unit of
analysis, and then call for measuring "improvement" with some
kind of average test score from year to year for that
institution. But the institution is the wrong unit of analysis.
It is meaningless to report that a school has, for instance,
improved its score. Schools do not take tests; students do. And
students move into and out of institutions, carrying their test
scores with them. You cannot infer institutional progress over
time by looking at the average scores of students at one point in
time and then the average scores of a different group of students
at another point in time. Drawing inferences from such
"snapshots" can lead to ridiculous conclusions about school
"effects," like that of the demographer who remarked about the
amazing effect of living in Miami, where everyone is born Cuban
and dies Jewish.

Furthermore, such invalid methods of comparison create
very, very dangerous incentives with respect to equity. The best
way to improve a school's average test score is to make sure the
people who score poorly on the test don't come through the
school. If funding is linked to test scores, schools will have
incentives to push kids out rather than have them take tests on
which they may do poorly. Incentives are created to reduce
integration: to keep rich and poor students apart, to keep
black and white students apart. To gain rewards, teachers and
administrators will have to seek out schools like Lake Wobegon,
where their value can be corroborated by the test scores of their
students. The incentive is to affiliate with the students who
need the least teaching, thus exacerbating what already tends to
occur in the distribution of students (and resources to serve
them) in this country. This situation is already embedded in
policies that affect higher education in several states. But if
we want to provide the public with real accountability, we have
to attend to the distribution of resources and the quality of
education, not the distribution of test scores.

To do this, we need to broaden the notion of what is a
legitimate indicator of learning, looking not only at student
test scores but at written and other types of performance as
well; at student projects that demonstrate the ability to
conceive an idea, work through problems, and produce solutions;
at performances in journalism, the arts, debating, playwriting;
at research and demonstration projects in the natural and social
sciences. Measures will have to become more complicated,
involving real observations and judgments by teachers and
administrators of what has been learned and accomplished. More
external review, by faculties looking at each other's students,
is called for, as is done in Europe. This is not to say that
some paper-and-pencil test, even entailing standard questions,
might not be appropriate. But let's think about formats other
than multiple choice, and results that aren't automatically
reduced to single numerical scores. Let's develop indicators of
real performance of actual goals, rather than artificial measures
of discrete sub-objectives that bear a questionable relationship
to the actual learning we value in our students.

The time is here for a new day in American assessment. And
there are a lot of creative activities going on in a number of
your institutions that can lead the way toward assessment that
serves the goals of both accountability and quality education. A
new tradition of assessment requires us to be clear about what
rewards and sanctions we're creating and how they relate to what
we value and how we teach, to long-term learning and to equality
of opportunity. Educationally sound assessment will mean paying
attention to far more than the technical aspects of data
collection and manipulation. It means allowing for relevance and
coherence in each institution.

It also means paying attention to side effects, because 
whatever assessment schemes you adopt, your choices will have 
side effects. Watch for them, monitor them, and continually seek 
improvements. Educate the legislature if that is necessary.
Educate those who are engaged with you in monitoring the
assessment process. This is an extremely important activity. It
can be extremely valuable. But if poorly done, it can be
extremely harmful.

With the marriage of policy making and assessment, the work 
of educators and educational researchers has become more critical 
than ever before. We must insist on intellectual honesty and on
educational validity. We must educate those who would impose
hasty or inadequate methods. We must stand up for students, for 
academic quality and equality, and for a humane educational 
system. Professionals like yourselves who are rising to this 
challenge will make a profound and constructive contribution to 
American education. 




THE ASSESSMENT MOVEMENT:
WHAT NEXT? WHO CARES?



An address by: 

Robert H. McCabe 
Miami-Dade Community 
College 



Presented at: 

The AAHE Assessment Forum 
Third National Conference on Assessment 
in Higher Education 
June 8-11, 1988 
Chicago, Illinois 






Keynote Address 
American Association for Higher Education 
Third National Conference on Assessment in Higher Education
Chicago, Illinois - June 9, 1988 

THE ASSESSMENT MOVEMENT: WHAT NEXT? WHO CARES?

by 

Robert H. McCabe, President
Miami-Dade Community College

How has assessment reached the present high level of interest?
What is going on in assessment? Where is the thrust coming from, and 

Access and Standards 

As background, it is relevant to review some of the events of
American higher education in the 60s and 70s. The emergence of
community colleges with an open-door philosophy, fewer colleges and
universities being really selective, and an increased interest by
individuals in postsecondary education have resulted in a greater
diversity of students.

There was substantial change in higher education in the 1960s. I
left Miami in 1965 and went to New Jersey to open a new community
college in downtown Newark. It was a great challenge and a good
experience for me, and it calls to mind a key feature of the early
60s: it was a period of great optimism and stress on all systems.
Americans were committed to progress in civil rights and believed
that there would be rapid progress, that all of the shortcomings
and the deprivation for minority groups were going to be overcome,
and quickly. There was an expectation of rapid advancement for
minorities and other groups who had not been well served. In
higher education, access was the message.

In community colleges we talked of "the right to fail." It was a
"do-your-own-thing" period in which we believed that students knew more
about what they should or could do than we did. And, therefore, we gave
them permission to take any courses that they wanted. It might work
out; if it didn't, they had their "right to fail." In that process, and
in the process of expanding access, we lowered expectations. I learned
that very well when a faculty member in a campus outreach center told
me, "I just can't ask these students under these circumstances, and with
their personal commitments, for the same things I would ask students on
campus." Across all of higher education, and particularly in open-door
institutions, we did lower expectations. Despite the long-term negative
impact on higher education, in the context of those times I would do
the same thing I did in Newark in 1965 all over again. It was really
important to get some of those individuals previously unserved through
the system.

One major current problem is that many in education appear to have
permitted their attitudes to become "stuck" in the 1960s. We haven't
moved on as we should have from the emphasis on access to that of access
with quality, the undeniable need for American society today.

At the same time these things were happening in higher education,
something similar was happening in the schools, and the evidence of
declines in the skills that students have as they graduate from our
schools is very substantial.

That problem has been exacerbated by a rising literacy requirement.
From the close of the Second World War to the beginning of the
1980s, the share of unskilled and semi-skilled jobs dropped from 80 to
20 percent. Thus, it is absolutely essential that larger percentages of
Americans develop higher-level information skills so that they can
participate effectively in society. I remember that when I graduated from
high school with nine other individuals, I was the only one who went on
to college, but there were jobs for all of the others. Today, that
would not be the case. There are very few opportunities for individuals
who do not have solid information skills.

With nearly two-thirds of high school graduates entering
postsecondary programs within five years of high school graduation, and
the growing number of high school graduates who are academically
deficient, colleges must design their programs to deal with a
dramatically more diverse student body. They must also be prepared for
a task new to many: helping underprepared students to academic success
rather than weeding them out. In my opinion, the mismatch between the
information skills of young Americans and those necessary to function
effectively in society is the greatest threat to the well-being of the
country, this side of a nuclear holocaust. Higher education has an
important part in rectifying this situation by helping individuals who
have completed high school, but do not have sufficient information
skills, so that they are "salvaged." Assessment can contribute
significantly to higher education efforts to address this problem.

The Public Call for Accountability

In recent years, the public has become increasingly critical of
K-12 education and now higher education. There is a substantial amount
of mistrust of teachers, and a growing dissatisfaction with the
tremendous expenditure and seemingly limited results from our
institutions. The public wants assurance that something positive is
happening. This is being reflected at the state level, where
legislators and state officials are telling educators, "We want you to
perform in certain ways. To ensure that, we will regulate you more, and
we want proof of performance." I remember in the 1960s when we would go
to the legislature for appropriations. Our case was simple: "If you love
your children, you should give us more money." And legislatures
responded with more money. Now there are discussions of relative merit
and where limited resources should be spent. Ninety billion dollars
goes into higher education annually in this country, and the public
wants to know whether it is being well spent. And, as independent
colleges and universities begin to draw more public funds, state and
federal, they will not be exempt.

The response to recognition of unsatisfactory educational quality is coming both from the public, through legislatures, and from institutions, and both involve assessment. However, they think of assessment rather differently. At the state level, the assessment movement started as a standardized test movement within the K-12 system. Greg Anrig, the president of the Educational Testing Service, cites an old army saying, "If it moves, salute it," and a current trend among state agencies is: "If it moves, test it." In fact, there are 24 states in which passing a test is a condition for graduation from high school. It is not something that is going to go away. There is more and more use of assessment through standardized tests throughout the educational system, some controlling individual student progress, some focusing on the performance of the institutions. In the 1980s, this trend that began in K-12 is moving into higher education, particularly into open-door institutions. Community colleges are certainly the most vulnerable to poorly conceived programs because these institutions have the most diverse student bodies. In the push for quality, many question the ability of these institutions to achieve a quality outcome with so many underprepared students. Others question the efficacy of expenditures for second and third chances for students whom they believe had their chance in high school. What those of us in the open-door colleges learn is that the fact that an individual does not have sufficient academic competence does not necessarily mean there is a lack of talent, or that the talent cannot be developed.
State Initiatives in Assessment 

Two-thirds of states now require some type of assessment in public colleges and universities. There are two primary issues at the state level. One is whether the assessment is going to be of the student, that is, a condition for the student to move from one level to the next, or whether its purpose is to evaluate or to give feedback for program improvement. The second is whether the state designs and specifies the assessment program, or whether institutions are permitted to devise their own programs.

It appears that there is a strong trend among state authorities to require assessment, but to consider the design and conduct of assessment a matter of institutional prerogative. There are very good reasons why that should be the case. However, as I meet legislators, governors, and state officials, they express a real sense of impatience, a feeling that institutions will not do anything unless the state forces them to do so. Therefore, while most state officials are saying, "Assessment is an activity in which institutions should take the lead," there is a deep suspicion that "the institutions aren't going to do it; we're going to have to do it ourselves." And, when it comes to educational programs, legislators and state officials are much more willing to do it themselves, as they have through a growing array of rules affecting education, including specifications on how certain courses are to be taught and what the content of the courses will be. We are seeing much more regulation and movement to assessment based on a growing mistrust of college educators.

Not surprisingly, the states rely heavily on standardized tests. My feeling is that the less one knows about standardized tests, the more willing one is to rely on them. (Which is not to say that I don't see their value.) The more one knows, the more one is aware of the limitations. The less one knows, the greater the appeal of "scientific" measurement that can be objectively expressed. State officials turn to standardized tests because they are objective, with numbers that can be used for comparisons or standards. They provide an easy way to look at reading, writing, and mathematics skills, which is where much of the state thrust is aimed.

What concerns me most is the use of standardized tests by states as a single measure of performance. It is one thing for the states to say, "We are giving you tremendous amounts of money; we want information to indicate whether you are being successful." It is quite another for the state to mandate a particular test or tests, and to determine when they will be given, who will administer them, and what will happen with the results.





The Florida Experience 

Let me turn to my own state, Florida, as an example of the extreme side of the state-based, standardized test-oriented approach to assessment. I am going to be very critical, but I want to say up front that the project has had some positive impact. There is no doubt that the Florida higher education institutions are more concerned about student information skills, and there is more effort in placement and remediation. However, on the whole, it is my opinion that the Florida approach is not the right one. It should be modified in Florida and not copied in other states.

Let's begin with the first part of the Florida program, a statewide, standardized test for placement. Legislators are comfortable with this requirement because it allows them to say: "Okay, now we have uniform data. We know whether students are really academically deficient or not." Without doubt, mistrust is part of the appeal of statewide placement testing, as it provides a standard that won't allow institutions to "cheat." Cheat, that is, with regard to how they are funded and how they report enrollment. This approach has two major disadvantages. First of all, each of our institutions has a different curriculum, and we certainly want to keep it that way. The use of a placement test should be tied to the curriculum and its special features. For example, at my institution, our approach to placement (which we were using before the state mandated it) was to determine placement only after we examined the relationship between performance on the test and performance in college general education courses. That is, we based placement on what we knew about student performance in our curriculum. Second, once students were placed, the faculty could adjust placement based on classroom performance. For example, in writing courses, on the first day of class students wrote an essay, and on that basis faculty could move them to a more appropriate class.

The point is that placement testing is more likely to benefit students when it is designed and administered by the institution. It should be tied to the curriculum; its primary purpose should not be reporting to the state, but improving the growth and development of students. The college has the advantage of using faculty judgment and classroom performance for classroom placement, and can follow up on the programs of placement and remediation to determine if they have the intended effect.

The next piece in the Florida program is the rising-junior test: the College Level Academic Skills Test (CLAST), required of all students in public institutions (or in independent institutions receiving state aid) for entrance to upper-level coursework or the award of an associate in arts degree. This is the most controversial component of the testing program. The CLAST experience shows the concerns that all of us should have about standardized rising-junior examinations. One concern is that it drives curriculum. In the case of this examination, which was designed by faculty from across the state, there are five subdisciplines in the mathematics test. Therefore, the colleges have had to redesign the curriculum in order to align with that particular set of mathematics competencies. It appears that many of those competencies are not needed by many of the students, but their curriculums by necessity include a sequence of courses to learn these competencies. A similar problem occurs in the area of composition. In my institution, half of our 45,000 credit students have a native language other than English. You can imagine the task of getting these students through the composition section of CLAST, particularly as it is timed and may be on a topic with which they have no familiarity. It makes it very tempting (and I heard this proposed on one of our campuses) to turn the English curriculum into a program to teach students how to produce a credible essay within extreme time constraints: simply test preparation. And this flies in the face of what most faculty tell me they should be teaching. They tell me that they teach students to organize ideas, to outline, to revise, and to use dictionaries. The need to help students pass CLAST drives us toward a curriculum that the faculty do not support. This is certainly not in the best interests of students. Further, the increase of English and mathematics enrollment is squeezing students out of sophomore-level courses in the humanities, sciences, and social sciences. CLAST is beginning to dominate curriculum.

Most important is the unquestioned use of CLAST as an independent criterion of student success. Is a statewide standardized test a valid predictor of students' ability to succeed? The fact is, there are large numbers of students who did not pass CLAST and proceeded to the upper division (when that was permitted), and are performing well in the junior and senior years. Studies suggest that the best forecaster of success is a combination of grades and a standardized test, the next best is grades, and the least effective is a standardized test by itself.

CLAST is having a particularly devastating impact on minorities. The number proceeding to upper division is in sharp decline, and our data suggest that many of those could be successful. So why not consider grades along with test scores? One must ask, in a country where there is a severe problem with the small number of minorities advancing through each level of education, why would a program be utilized that is cutting out many of those who could succeed from proceeding through baccalaureate programs? It simply doesn't make any sense.

Unfortunately, public relations plays an important part in statewide test programs. It has an impact on the public image of the institutions, and can result in decisions based on that rather than on what is good for students. I know of two institutions in Florida that are screening students before they take the CLAST. So, in fact, only those students whom they are convinced will pass take the test. The result: when the newspaper article comes out and ranks institutions by results, those two institutions look good. But is that good for students? How about the student who has a good, though not sure, shot at passing? That student ought to have the opportunity to take the test. The question shouldn't be "What's going to give us the best public image?" but "What is in the best interest of the student?" A year ago the legislature passed a rule which permitted students to take the CLAST at any time. That made sense, as students who had the competencies at admission or early in their student careers could be exempted from the CLAST curriculum sequence and permitted to enroll in a richer curriculum. One of my colleagues argued against this. "Wait a minute," he said, "if we do that, the pass percentages on CLAST for my institution aren't going to look as good." And frankly, when institutions are ranked in the newspaper four times a year on the basis of scores, it is hard not to think that way. It is a serious problem.

And so I hope we can make some changes in the Florida program: not do away with it, but make changes that will make it more beneficial to students. We need to keep in view two things that distinguish American higher education from other systems. One is institutional and curricular diversity, which is threatened by the imposition of a standardized test; the other is giving second and third chances to students, which is threatened by making judgments about students on the basis of a single criterion, a standardized test.

Finally, we come to an issue that underlies all the others: the continuing thrust toward centralization. Everything I read about management indicates the importance of making decisions as close to the action as possible, and involving people in decisions that impact them. What Florida is doing in assessment runs contrary to good management practice. The program does not tie to the curriculum. It is not designed to give feedback to students, and it bypasses faculty. Basing decisions on a single statewide measure sets aside all of the expert judgment that faculty have exercised over the years in the classroom. And I think in some cases it can force students into poor practices. Florida has taken leadership in the utilization of statewide assessment, and much has been learned, both positive and negative. Now we need to utilize our experience and take responsibility to design an assessment program that helps students learn and grow. At the same time, we must recognize that the public has every reason to expect us to do more and to show the results of our efforts.

Institutional Assessment Initiatives 

I would like to turn now to the other aspect of what is happening in assessment: institutional initiatives. They have great potential. Much of the impetus here comes from institutions looking at themselves, their programs, and results, and asking, "Do we really know what we're trying to achieve?" and "Have we got any way of determining whether we are being successful?"

There may have been a time when one of the roles of universities and colleges was to screen out students who were not prepared for immediate success. Full responsibility was on the student. We all know the stories of the dean addressing freshmen at orientation and saying, "Those of you who don't find the library by the end of the first football game won't be here by Thanksgiving." Underlying this attitude was an assumption that entering students had been well prepared and should be able to take care of themselves. I think that approach is history for all but a few institutions. When we look at the students who are entering colleges today, and consider the needs of the society for increased numbers of people who have strong information (academic) skills, it is clear that institutions must take greater responsibility for student success. The job is no longer to screen out, but to help more students, including the underprepared, to quality academic performance. Placement and program guidance are now important concerns, so there is good reason to develop assessment programs. Not only does the public have a right to know how well we are doing; it is vital to students that we know more about them. The idea that students either get it or get out just isn't appropriate in the face of massive underpreparation and growing diversity. This nation needs to develop all of its talent, and the fact that an individual is not well prepared academically at one point in life does not mean that he or she has no talent or potential. Assessment can help us tap that talent and potential.

For assessment to help, it must involve faculty. When we assess the effectiveness of the curriculum, the faculty must participate in determining the desired program outcomes. There must be overall purpose and unity to the curriculum. The major hurdle is for faculty to understand that the courses they teach are not entirely independent. That is going to be very difficult to achieve, but it is essential if assessment is going to make any difference in improving student programs. The key is to understand that the focus is on the student and what the courses can contribute to his or her development.

Moreover, I would argue that we have some obligation to assess teaching itself. Colleges have the reverse of the problem in K-12, where the concern is that the faculty do not have adequate mastery of their subject matter. In higher education, faculty are well grounded in their disciplines, but they don't really know a great deal about teaching or learning. This is particularly striking when considered in light of the significant body of knowledge about teaching and learning that has accumulated in the past thirty years, knowledge that is not a part of the vocabulary of our faculty. Most come to college teaching with a love for their subject matter, and a love for the kind of work they did in graduate school. But teaching is something they learn on the job or not at all. A medical analogy suggests itself. If I needed heart surgery, I would look for a doctor who knew more about the heart than anybody else in the world. But if that person had never had a scalpel in hand, I'm not sure I'd want to be operated on. And to a great extent, this is the situation with faculty. They have wonderful knowledge about their special field, but little knowledge of "the operation": teaching and learning. College teaching is one of the few professions where we don't stand on the shoulders of those who went before, learning from them so that each generation improves. That really needs to change, especially as institutions and faculty take greater responsibility for student performance. It is time to come to grips with what is expected in faculty performance, and to think about that in light of what assessment can tell us about student learning. We need to help faculty become better teachers, and we need to help them assess how well they succeed.

Related to this is shifting concern to our output instead of our input. What do colleges brag about? The SAT scores of incoming students: how many students came with this or that level of ability. Most have little to say about how students grew or what they can do as a result of our programs. I was at a meeting of state leaders in Florida. They were discussing a suggestion that our goal should be to improve the quality of higher education in Florida. I stated my belief that this was an inappropriate goal. Why? Because it would be very easy to improve the quality of higher education simply by reducing the number of students admitted, screening out the less well prepared. Presto! The quality of higher education would be enhanced. But the goal must be for education to be more successful in meeting the needs of society. If you put it that way, you come up with a very different kind of solution. Improving quality involves expanding the number of students gaining academic achievement. To realize this goal, colleges should assess the effects of educational programs on students and develop methods to get that information back to students in ways that help them know how they are doing, what they can do better, and what their next step needs to be. The same information needs to be available to each teaching faculty member as a basis for improving teaching.

This, it seems to me, is what Pat Cross has been talking about: classroom research. I am very impressed with the potential at the level of assessment she advocates, assessment that gets down to what each faculty member does with students, the place where the real learning takes place.
Summary

Where are we and what has happened with assessment? I have commented on state initiatives and institutional initiatives, and on some of the issues that arise in the tension between those initiatives. Should assessment be the responsibility of the state or of each institution? Is it to make determinations about individual students or to evaluate the institution? Is it for both? Should it be standard across institutions, or diverse? How should the results be utilized?

Where do we go from here? There is little doubt that this assessment movement is going to grow. There is no doubt that the interest of legislators and the public is increasing. And I am encouraged by the growth of interest that I see within institutions, particularly geared to student learning and student growth. Hopefully that is where the most energetic assessment efforts will occur. Assessment should be an ongoing vehicle for self-awareness and change.

The assessment movement is growing in tandem with the teaching/learning movement. In fact, it could be considered an element of it. There is tremendous interest in improving teaching and doing more to improve student learning. Assessment is essential to that, whether it be classroom research or program assessment. The most promising assessment programs now in progress deal with the impact of teaching on student learning and feedback to students and faculty. Lee Shulman's work comes to mind, as does the program at Alverno College. What I am suggesting is that the future of assessment will be in improving student development through more effective teaching and learning. In the 1960s, community colleges operated on the basis that students had a "right to fail." We should now operate on the basis of their having a "right to succeed," and assessment can contribute to that success.



ADDITIONAL RESOURCES 

AAHE ASSESSMENT FORUM 



The following resources are available for purchase from the AAHE Assessment Forum: 

1. Resource Packet I: Five Papers $15.00
-"Assessment, Accountability, and Improvement: Managing the 
Contradiction," P. Ewell 

-"Assessment and Outcomes Measurement: A View from the
States," C. Boyer, P. Ewell, J. Finney, and J. Mingle
-"The External Examiner Approach to Assessment," B. Fong 
-"Six Stories: Implementing Successful Assessment," P. 
Hutchings 

-"Thinking About Assessment: Perspectives for Presidents and 
Chief Academic Officers," E. El-Khawas and J. Rossmann 

2. Resource Packet II: Six Papers $25.00
-"Acting Out State-Mandated Assessment: Evidence from Five
States," C. Boyer and P. Ewell

-"Assessing Student Learning in Light of How Students Learn," J. 
Novak and D. Ridley 

-"Faculty Voices on Assessment: Expanding the Conversation," 
P. Hutchings and E. Reuben 

-"Feedback in the Classroom: Making Assessment Matter," K.
Patricia Cross

-"Standardized Tests and the Purposes of Assessment," J.
Heffernan

-"An Update on Assessment," (AAHE Bulletin, December, 1987), P. 
Hutchings and T. Marchese 

3. Three Presentations-1987: $8.00

from the 2nd National Conference on Assessment in Higher Education 
-Lee S. Shulman - "Assessing Content and Process:
Challenges for the New Assessments" 
-Virginia B. Smith - "In the Eye of the Beholder: 
Perspectives on Quality" 

-Donald M. Stewart - "The Ethics of Assessment" 

4. Three Presentations-1988: $10.00
from the 3rd National Conference on Assessment in Higher Education 
-Alexander Astin - "Assessment and Human Values: 
Confessions of a Reformed Number Cruncher"
-Linda Darling-Hammond - "Assessment and Incentives:
The Medium is the Message"
-Robert H. McCabe - "The Assessment Movement: 
What Next? Who Cares?" 

5. Assessment Programs and Projects: A Directory $10.00
Concise descriptions of thirty assessment projects being 
implemented on campuses across the country. Edited by Jacqueline 
Paskow. 

To order items indicated above, contact: Patricia Hutchings, Director, AAHE Assessment Forum, One Dupont Circle, Suite 600, Washington, DC 20036; 202/293-6440. Orders under $25 must be prepaid. Allow four weeks for delivery. MAKE CHECKS PAYABLE TO AAHE ASSESSMENT FORUM.



